
[LLADA2] Fix llada2 review #13598 #13698

Merged: kashif merged 5 commits into huggingface:main from kashif:fix-llada2-review-13598 on May 17, 2026

@kashif (Contributor) commented on May 8, 2026

What does this PR do?

Fix the issues raised in #13598

Fixes #13598

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

kashif added 2 commits May 8, 2026 09:12
Fixes the six in-scope issues raised in the llada2 model/pipeline review:

1. Carry tokenizer `attention_mask` through `_prepare_input_ids` and add an
   `attention_mask` arg to `__call__` for pre-tokenized inputs. The runtime
   mask now reflects prompt padding and zeros out the block-aligned tail
   past `prompt_length + gen_length` instead of treating those positions
   as valid context.

2. Thread the per-call `block_length` into `BlockRefinementScheduler.set_timesteps`
   so the transfer schedule matches the requested block size (previously the
   scheduler only read its constructor default).

3. Drop `x0`/`x0_p`/`confidence` from `_callback_tensor_inputs` (they were never
   bound as locals) and bind `sampled_tokens`, `sampled_probs`,
   `editing_transfer_index`, and `active_block` so all advertised callback keys resolve.

4. Allow EOS exactly at index `prompt_length` (the first generated position)
   to mark a row finished.

5. Freeze rows that have already emitted EOS so subsequent block refinement
   doesn't extend them, and trim each row at decode time (previously trimming was
   gated on `batch_size == 1`) so post-EOS positions don't leak into decoded text.

6. Stop calling `self.set_progress_bar_config(...)` from inside `__call__`;
   build a local config dict for the inner block bar so user-supplied flags
   (in particular `disable=True`) survive the call.

Adds regression tests pinning each of the six fixes.
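To make fix 1 concrete, here is a minimal pure-Python sketch of the mask it describes, assuming left-padded prompts that all share one padded `prompt_length` and a sequence rounded up to a whole number of `block_length` blocks. `build_runtime_mask` is a hypothetical helper that returns one flat 0/1 list per row; the pipeline itself works on tensors and may build a different mask shape.

```python
import math
from typing import List


def build_runtime_mask(prompt_masks: List[List[int]], gen_length: int, block_length: int) -> List[List[int]]:
    """Hypothetical helper: per-row 0/1 attention mask over the block-aligned sequence.

    prompt_masks holds the tokenizer attention_mask for each row (1 = real token,
    0 = padding); every row is assumed to share the same padded prompt_length.
    """
    prompt_length = len(prompt_masks[0])
    valid_length = prompt_length + gen_length
    # Round the sequence length up to a whole number of blocks.
    total_length = math.ceil(valid_length / block_length) * block_length
    tail_length = total_length - valid_length
    runtime_masks = []
    for row in prompt_masks:
        # Prompt padding stays masked, generated positions are valid context,
        # and the block-aligned tail past prompt_length + gen_length is masked out.
        runtime_masks.append(list(row) + [1] * gen_length + [0] * tail_length)
    return runtime_masks
```

For example, `build_runtime_mask([[0, 1, 1], [1, 1, 1]], gen_length=4, block_length=4)` returns `[[0, 1, 1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 1, 1, 1, 0]]`: the padded first prompt position and the one-token tail past `prompt_length + gen_length` are both masked.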
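Fix 2, together with the later commit that makes `block_length` optional and falls back to the scheduler's default, can be illustrated with a toy scheduler. This sketch assumes each block's tokens are split uniformly across refinement steps; `transfer_schedule` and `ToyBlockScheduler` are hypothetical names and do not reproduce the real `BlockRefinementScheduler` API or its exact schedule.

```python
from typing import List, Optional


def transfer_schedule(block_length: int, num_steps: int) -> List[int]:
    """Tokens to unmask at each refinement step so that one block is filled exactly."""
    base, remainder = divmod(block_length, num_steps)
    # Give the remainder to the earliest steps; the counts always sum to block_length.
    return [base + (1 if step < remainder else 0) for step in range(num_steps)]


class ToyBlockScheduler:
    """Hypothetical stand-in, not diffusers' BlockRefinementScheduler."""

    def __init__(self, block_length: int = 32):
        self.default_block_length = block_length
        self.num_transfer_tokens: List[int] = []

    def set_timesteps(self, num_inference_steps: int, block_length: Optional[int] = None) -> None:
        # Use the per-call block size when given; otherwise fall back to the constructor default.
        if block_length is None:
            block_length = self.default_block_length
        self.num_transfer_tokens = transfer_schedule(block_length, num_inference_steps)
```

With `ToyBlockScheduler(block_length=32)`, calling `set_timesteps(5, block_length=16)` yields `[4, 3, 3, 3, 3]`, while `set_timesteps(5)` yields `[7, 7, 6, 6, 6]`. Reading only the constructor default for a 16-token block would try to commit 32 tokens, which is the mismatch fix 2 removes.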
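Fix 3 matters because diffusers pipelines commonly gather callback tensors by name from the step's local variables, so every advertised key must be a bound local. The sketch below assumes that pattern; `toy_refine_step` is hypothetical, its values are placeholders, and `ADVERTISED_CALLBACK_INPUTS` lists only the four names from fix 3, not the pipeline's full `_callback_tensor_inputs`.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical subset: only the four keys fix 3 binds; the real list may contain more.
ADVERTISED_CALLBACK_INPUTS = ["sampled_tokens", "sampled_probs", "editing_transfer_index", "active_block"]


def toy_refine_step(
    step: int,
    callback: Optional[Callable[[int, Dict[str, object]], None]] = None,
    tensor_inputs: List[str] = ADVERTISED_CALLBACK_INPUTS,
) -> None:
    # Placeholder values standing in for the tensors a real refinement step computes.
    sampled_tokens = [5, 9, 2]
    sampled_probs = [0.9, 0.4, 0.7]
    editing_transfer_index = [True, False, True]
    active_block = 0
    if callback is not None:
        # Snapshot the locals before building the dict, then look each key up by name.
        step_locals = locals()
        # An advertised but unbound name (like x0 / x0_p / confidence before the fix)
        # would raise KeyError on this lookup.
        callback_kwargs = {name: step_locals[name] for name in tensor_inputs}
        callback(step, callback_kwargs)
```

Passing a callback such as `lambda step, kwargs: print(sorted(kwargs))` receives all four keys; requesting a name the step never binds fails immediately instead of silently producing nothing.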
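Fixes 4 and 5 can be sketched on plain token lists, assuming `eos_id` marks the end of a row and the EOS token itself is dropped from the decoded text. All four helpers are hypothetical; the pipeline applies the same ideas to batched tensors.

```python
from typing import List, Optional


def first_eos(row: List[int], eos_id: int, prompt_length: int) -> Optional[int]:
    # Search from prompt_length itself, so EOS at the first generated position counts (fix 4).
    for index in range(prompt_length, len(row)):
        if row[index] == eos_id:
            return index
    return None


def mark_finished(rows: List[List[int]], finished: List[bool], eos_id: int, prompt_length: int) -> None:
    # A row becomes finished as soon as an EOS appears anywhere in its generated region.
    for b, row in enumerate(rows):
        if not finished[b] and first_eos(row, eos_id, prompt_length) is not None:
            finished[b] = True


def write_block(rows: List[List[int]], finished: List[bool], block_tokens: List[List[int]], start: int) -> None:
    # Rows that already emitted EOS are frozen; only unfinished rows take the new block (fix 5).
    for b, tokens in enumerate(block_tokens):
        if not finished[b]:
            rows[b][start:start + len(tokens)] = tokens


def trim_rows(rows: List[List[int]], eos_id: int, prompt_length: int) -> List[List[int]]:
    # Trim every row on its own, whatever the batch size, dropping EOS and everything after it.
    trimmed = []
    for row in rows:
        eos = first_eos(row, eos_id, prompt_length)
        trimmed.append(row[prompt_length:len(row) if eos is None else eos])
    return trimmed
```

For instance, with `prompt_length=2`, `eos_id=0` and `rows = [[7, 8, 0, 4, 9, 9], [7, 8, 3, 5, 9, 9]]`, `mark_finished` marks only the first row (its EOS sits exactly at index 2). Writing the next block `[[6, 6], [0, 1]]` at `start=4` then leaves the first row untouched, and `trim_rows` returns `[[], [3, 5]]`.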
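Fix 6 is about not mutating pipeline-level state from inside `__call__`. The toy class below assumes that `set_progress_bar_config` replaces the stored dict wholesale, so calling it mid-call discards flags the user set earlier. `ToyPipeline` and both `inner_bar_config*` methods are hypothetical, and `leave=False` is only an illustrative per-call option for an inner tqdm bar, not necessarily what the pipeline passes.

```python
from typing import Any, Dict


class ToyPipeline:
    """Hypothetical stand-in; set_progress_bar_config replaces the stored dict."""

    def __init__(self) -> None:
        self._progress_bar_config: Dict[str, Any] = {}

    def set_progress_bar_config(self, **kwargs: Any) -> None:
        self._progress_bar_config = kwargs

    def inner_bar_config_buggy(self) -> Dict[str, Any]:
        # Old pattern: reconfigure the pipeline from inside the call, losing the user's flags.
        self.set_progress_bar_config(leave=False)
        return self._progress_bar_config

    def inner_bar_config(self) -> Dict[str, Any]:
        # Fixed pattern: copy the user's config and layer per-call defaults onto the local copy.
        config = dict(self._progress_bar_config)
        config.setdefault("leave", False)
        return config
```

After `set_progress_bar_config(disable=True)`, the fixed method returns `{"disable": True, "leave": False}` and leaves the stored config untouched, whereas the buggy method returns `{"leave": False}` and the user's `disable=True` is gone for this and every later call.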
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions (bot) removed the models label on May 9, 2026
kashif requested a review from dg845 on May 10, 2026 15:39
@dg845 (Collaborator) reviewed, leaving a comment thread on src/diffusers/pipelines/llada2/pipeline_llada2.py:

Thanks for the PR! Left one comment.

kashif merged commit 79de306 into huggingface:main on May 17, 2026 (13 of 15 checks passed).
Enderfga pushed a commit to Enderfga/diffusers that referenced this pull request on May 19, 2026:
* [LLaDA2] address review findings from huggingface#13598

* fix formatting

* undo changes

* set block_length to optional and use scheduler's default

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

Development: successfully merging this pull request may close these issues: llada2 model/pipeline review (#13598).

3 participants